
SYSTEM AND METHOD FOR MANAGING STORAGE RESOURCES IN A
CLUSTERED COMPUTING ENVIRONMENT



The present disclosure relates in general to the
field of data storage systems and, more particularly, to
a system and method for managing storage resources in a
clustered computing environment.
Storage area networks (SANs) often include a
collection of data storage resources communicatively
coupled to a plurality of nodes such as workstations and
servers. In the present disclosure, the terms "node" and
"server" are used interchangeably, with the understanding
that a "server" is one type of "node".

Within a SAN, a server may access a data storage
resource across a fabric using the Fibre Channel
protocol. The Fibre Channel protocol may act as a common
physical layer that allows for the transportation of
multiple upper layer protocols, such as the small
computer system interface (SCSI) protocol. In a SAN
environment, the SCSI protocol may assign logical unit
numbers (LUNs) to the collection of data storage
resources. The LUNs may allow a server within a SAN to
access specific data storage resources by referencing a
SCSI LUN for a specific data storage resource.

Though a Fibre Channel storage system can offer a
great deal of storage capacity, the system can also be
very expensive to implement. As a result, users often
seek to share the available storage provided by the
system among multiple servers. Unfortunately, if a
server coupled to a given SAN uses the MICROSOFT WINDOWS
NT (Trade Mark) operating system, the server may attempt to take
ownership of any LUN visible to the server. For example,
if a particular server detects several LUNs when the
server boots, it may assume each LUN is available for its
use. Therefore, if multiple WINDOWS NT (Trade Mark) servers are
attached to a storage pool or a collection of data
storage resources, each server may attempt to take
control of each LUN in the storage pool. This situation
can lead to conflicts when more than one server attempts
to access the same LUN.
A user seeking to solve this problem may partition
or zone the available storage through filtering or
through the use of miniport drivers that have LUN masking
capabilities. In effect, this partitioning may prevent a
server running WINDOWS NT (Trade Mark) from seeing storage capacity
that is not assigned to it. This approach may be
effective for stand-alone servers, but the approach has
several shortcomings in a clustered computing
environment.

Clustering involves the configuring of a group of
independent servers so that they appear on a network as a
single machine. Often, clusters are managed as a single
system, share a common namespace, and are designed
specifically to tolerate component failures and to
support the addition or subtraction of components in a
transparent manner. Unfortunately, because a cluster may
have two or more servers that appear to be a single
machine, the partitioning techniques mentioned above may
prove an ineffective solution for avoiding conflicts when
the two or more servers attempt to access the same LUN.

MICROSOFT CLUSTER SERVER (MSCS) (Trade Mark) embodies one
currently available technique for arbitrating conflicts
and managing ownership of storage devices in a clustered
computing environment. An MSCS system may operate within
a cluster that has two servers: server A, which may be in
charge, and server B. In operation, server A may pass a
periodic heartbeat signal to server B to let server B
know that server A is "alive". If server B does not
receive a timely heartbeat from server A, server B may
seek to determine whether server A is operable and/or
whether server B may take ownership of any LUNs reserved
for server A. Unfortunately, the MSCS system may
utilize SCSI target resets during this process, and the
SCSI resets may create several problems. For example, a
typical SCSI reset in the MSCS system may cause all
servers within a given Fibre Channel system to abort
their pending input/output (I/O) processes. These
aborted I/O processes may eventually be completed, but not
until the bus settles. This abort/wait/retry approach
can have a detrimental effect on overall system
performance.
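The heartbeat check described above can be sketched as follows. This is a minimal Python illustration, not the MSCS implementation: the `HeartbeatMonitor` class, its `timeout` value, and the explicit `now` timestamps (used in place of a real clock so the logic is easy to follow) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks when a cluster mate was last heard from (hypothetical helper)."""

    timeout: float          # seconds allowed between heartbeats (assumed value)
    last_seen: float = 0.0  # timestamp of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        # Called by server B each time a heartbeat from server A arrives.
        self.last_seen = now

    def peer_suspect(self, now: float) -> bool:
        # True when no heartbeat has arrived within the timeout window.
        # A missed heartbeat may only mean a broken link, so this is a
        # suspicion that should trigger further checks, not proof of failure.
        return now - self.last_seen > self.timeout


mon = HeartbeatMonitor(timeout=5.0)
mon.record_heartbeat(now=100.0)
print(mon.peer_suspect(now=103.0))  # False
print(mon.peer_suspect(now=106.0))  # True
```

A result of `True` only tells server B that server A has gone quiet; server B must still determine whether server A is actually inoperable before acting on its LUNs.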

In addition to this potential effect on performance,
the MSCS system and its use of SCSI resets may have a
detrimental effect on overall system reliability. In
operation, the MSCS system may only account for one SCSI
reset at a time. The inability to account for subsequent
SCSI resets may lead to unexpected behavior and decrease
system reliability.
In accordance with the present disclosure, a system
and method for managing storage resources in a clustered
computing environment are disclosed that provide
significant advantages over prior developed techniques.
The disclosed system and method may allow for storage
resource management and conflict arbitration with a
reduced reliance on SCSI resets.

In co-pending British Patent Application GB-A-2367162, from which the
present case is divided, we disclose and claim a method for managing storage
resources in a clustered computing environment including holding a reservation
on a storage resource for a first node of the clustered
computing environment. The node may be, for example, a
server, a workstation, or any other computing device
included within the cluster.

A third party process log out for the first node may
be performed and the reservation held for the first node
may be released. In one embodiment, the third party
process log out may occur in response to a log out
command sent on behalf of the first node. The third
party process log out command may be sent, for example,
by a second node or a Fibre Channel switch. The third
party process log out command may include identification
information that identifies the first node as the sender
of the log out command even though the first node was not
the actual sender. The identification information may
include, for example, a world wide name and a source
identifier assigned to the first node.

Managing storage resources in a clustered computing environment according
to the disclosure in the co-pending case may additionally involve the zoning of a
Fibre Channel storage system. A zone may group a first
node with a second node and a plurality of storage
resources such as hard drives and other data storage
devices. In the zoned system, a second node may log
itself out after a third party process log out command
has been issued for a first node. After the two nodes
are logged out, a loop initialization protocol (LIP) link
reset may be initiated, a state change notification may
be generated, and any functioning nodes may re-login.
According to a first aspect of the present disclosure, a method
for managing storage resources in a clustered computing environment
includes receiving a small computer system interface (SCSI)
reservation command that seeks to reserve a storage
resource for a node within the cluster. In response to
the reservation command, a SCSI persistent reserve out
command with a service action of Reserve may be issued to
reserve the storage resource for the node. This
persistent reserve may hold a clearable reservation on
the storage resource. In one embodiment, the reservation
may be cleared by issuing a SCSI persistent reserve out
command with a service action of Clear. The persistent
reserve commands may allow LUN reservations to be
individually released, as opposed to clearing several LUN
reservations at once with a SCSI reset.
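For readers who want to see what the two persistent reserve out commands look like on the wire, the following Python sketch packs a PERSISTENT RESERVE OUT command descriptor block (CDB) and its basic 24-byte parameter list. The field layout follows our reading of the SCSI Primary Commands (SPC-3) standard and should be checked against it; the function names, the reservation key value, and the choice of the Exclusive Access reservation type are illustrative assumptions. Note also that under SPC-3 an initiator must first register a reservation key (service action REGISTER) before RESERVE or CLEAR is accepted, a step the description above does not spell out.

```python
import struct

PERSISTENT_RESERVE_OUT = 0x5F   # operation code
SA_REGISTER, SA_RESERVE, SA_RELEASE, SA_CLEAR = 0x00, 0x01, 0x02, 0x03
TYPE_EXCLUSIVE_ACCESS = 0x3     # assumed reservation type for this sketch
LU_SCOPE = 0x0                  # reservation applies to the whole logical unit
PARAM_LIST_LEN = 24             # length of the basic parameter list


def pr_out_cdb(service_action: int, pr_type: int = 0, scope: int = LU_SCOPE) -> bytes:
    """Build the 10-byte PERSISTENT RESERVE OUT CDB.

    Layout: opcode, service action (low 5 bits), scope/type nibbles,
    two reserved bytes, 4-byte big-endian parameter list length, control.
    """
    scope_type = ((scope & 0x0F) << 4) | (pr_type & 0x0F)
    return struct.pack(">BBBxxIB", PERSISTENT_RESERVE_OUT,
                       service_action & 0x1F, scope_type, PARAM_LIST_LEN, 0)


def pr_out_params(reservation_key: int, sa_reservation_key: int = 0) -> bytes:
    """Build the 24-byte parameter list: reservation key, service action
    reservation key, 4 obsolete bytes, a flags byte (left 0), 3 more bytes."""
    return struct.pack(">QQ4xB3x", reservation_key, sa_reservation_key, 0)


KEY = 0x1122334455667788  # hypothetical key registered earlier by the node
reserve = (pr_out_cdb(SA_RESERVE, TYPE_EXCLUSIVE_ACCESS), pr_out_params(KEY))
clear = (pr_out_cdb(SA_CLEAR), pr_out_params(KEY))
print(reserve[0].hex())  # 5f010300000000001800
```

Because each command targets a single logical unit, releasing one node's reservation does not require disturbing any other LUN.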

According to a second aspect of the present
disclosure, a computer system operable to manage storage
resources in a clustered computing environment may
include a first node, a second node, and a resource
management engine operable to convert a SCSI reset command into a
storage resource releasing command.
The system may also include a computer readable medium storing the
resource management engine and a central processing unit
communicatively coupled to the computer readable medium
and operable to execute the resource management engine.
In one embodiment, the system may also include a
plurality of computing platforms communicatively coupled
to the first node. These computing platforms may be, for
example, a collection of networked personal computers.

The system may also include a Fibre Channel switch
communicatively coupled to the first node and to a
plurality of storage resources. The Fibre Channel switch
may, in some embodiments, include a central processing
unit operable to execute a resource management engine.

A system and method incorporating teachings of the
present disclosure may provide significant improvements
over conventional cluster resource management solutions.
For example, the disclosed techniques may be operable to
better manage and arbitrate storage resource conflicts.
As discussed above, a SCSI reset in a clustered computing
environment can result in the initiation of an
abort/wait/retry approach to several I/O processes, which
can have a detrimental effect on overall system
performance. The teachings of the present disclosure may
help reduce reliance on SCSI resets and the resulting
performance degradations.

In addition, the teachings of the present disclosure
may facilitate the avoidance of system reliability
problems associated with SCSI resets in a clustered
computing environment. A conventional cluster resource
management system, such as MSCS, may be unable to account
for SCSI resets initiated during the bus disturbance of
an earlier SCSI reset. This limitation may lead to
unexpected behavior and decrease system reliability.
Because the teachings of the present disclosure may
facilitate the avoidance of at least some SCSI resets,
system reliability may be improved.

Other technical advantages should be apparent to one
of ordinary skill in the art in view of the
specification, claims, and drawings.
The present invention will be described, by way of example, with
reference to the accompanying drawings, in which: 

FIGURE 1 depicts a component diagram of a storage area network 
including one embodiment of a resource management engine that 
incorporates teachings of the present disclosure; 

FIGURE 2 shows a flow diagram for a method for managing storage 
resources in a clustered computing environment in accordance with teachings 
of GB-A-2367162; and 

FIGURE 3 shows a flow diagram for an embodiment of a method for 
managing storage resources in a clustered computing environment in 
accordance with teachings of the present disclosure. 
FIGURE 1 depicts a general block diagram of a
storage area network (SAN), indicated generally at 10.
SAN 10 includes two clustered computing systems, clusters
12 and 14. As depicted, cluster 12 includes node 16 and
node 18, and cluster 14 includes nodes 20 and 22. Nodes
16, 18, 20, and 22 may be, for example, servers,
workstations, or other network computing devices. As
depicted in FIGURE 1, cluster 12 may be supporting a
number of client devices, such as the client personal
computers representatively depicted at 24.

SAN 10 may also include a storage pool 26, which may
include, for example, a plurality of physical storage
devices such as hard disk drives under the control of and
coupled to one or more storage controllers. The physical
storage devices of storage pool 26 may be assigned LUNs.
Some physical storage devices may be grouped into RAID
volumes, with each volume assigned a single SCSI LUN
address. Other physical storage devices may be
individually assigned one or more LUNs. However the LUNs
are assigned, the LUNs of FIGURE 1 may map the available
physical storage of storage pool 26 into a plurality of
logical storage devices and allow these logical storage
devices to be identified and addressed.

In operation, nodes 16, 18, 20, and 22 may
communicate with and transfer data to and from storage
pool 26 through fabric 28 using the Fibre Channel protocol.
As depicted in FIGURE 1, nodes 16 and 18 may be grouped
into zone 30 with LUN_1 and LUN_2. Similarly, nodes 20
and 22 may be grouped into zone 32 with LUN_3, LUN_4, and
LUN_5. Using switch zoning to create zone 30 may prevent
nodes 16 and 18 from seeing nodes 20 and 22. Similarly,
using switch zoning to create zone 32 may prevent nodes
20 and 22 from seeing nodes 16 and 18. In addition to
zoning, the embodiment of FIGURE 1 may employ LUN
masking. LUN masking may blind a specific node or
cluster from seeing certain LUNs. For example, LUN
masking may prevent nodes 16 and 18 from seeing LUN_3,
LUN_4, and LUN_5.
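The combined effect of switch zoning and LUN masking in FIGURE 1 can be modeled as two successive filters. The Python sketch below is a simplified model, not switch firmware: the `ZONES` and `MASKED` tables and both helper functions are hypothetical, and the masking entries simply restate the example given above.

```python
# Hypothetical model of FIGURE 1: switch zones, then per-node LUN masks.
ZONES = {
    "zone_30": {"nodes": {16, 18}, "luns": {"LUN_1", "LUN_2"}},
    "zone_32": {"nodes": {20, 22}, "luns": {"LUN_3", "LUN_4", "LUN_5"}},
}
# LUN masking: LUNs each node is explicitly blinded from.
MASKED = {
    16: {"LUN_3", "LUN_4", "LUN_5"},
    18: {"LUN_3", "LUN_4", "LUN_5"},
}


def visible_luns(node: int) -> set[str]:
    """LUNs a node can see: zone membership first, then LUN masking."""
    seen: set[str] = set()
    for zone in ZONES.values():
        if node in zone["nodes"]:
            seen |= zone["luns"]
    return seen - MASKED.get(node, set())


def visible_nodes(node: int) -> set[int]:
    """Other nodes that share at least one zone with `node`."""
    peers: set[int] = set()
    for zone in ZONES.values():
        if node in zone["nodes"]:
            peers |= zone["nodes"]
    return peers - {node}


print(sorted(visible_luns(16)))  # ['LUN_1', 'LUN_2']
```

In this particular layout the mask is redundant with the zones; masking becomes useful as a second, independent safeguard, for example if a zone is later widened.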

In the embodiment of FIGURE 1, nodes 16, 18, 20, and
22 may each be assigned a unique world wide name (WWN), which
may be an eight byte identifier. The Institute of
Electrical and Electronics Engineers (IEEE) assigns blocks of WWNs to
manufacturers so manufacturers can build Fibre Channel
devices with unique WWNs. For illustrative purposes, in
the embodiment of FIGURE 1, node 16 may have a WWN of
"AAA", node 18 may have a WWN of "BBB", node 20 may have
a WWN of "CCC", and node 22 may have a WWN of "DDD". As
such, nodes 16, 18, 20, and 22 may be uniquely
identifiable by other devices coupled to fabric 28.
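Because the WWNs "AAA" through "DDD" above are placeholders, the following Python sketch shows how a real eight byte WWN is commonly written as eight colon-separated hexadecimal octets. The helper names and the sample value are hypothetical.

```python
def format_wwn(wwn: bytes) -> str:
    """Render an eight byte world wide name as colon-separated hex."""
    if len(wwn) != 8:
        raise ValueError(f"WWN must be 8 bytes, got {len(wwn)}")
    return ":".join(f"{b:02x}" for b in wwn)


def parse_wwn(text: str) -> bytes:
    """Parse 'xx:xx:xx:xx:xx:xx:xx:xx' back into eight bytes."""
    octets = text.split(":")
    if len(octets) != 8:
        raise ValueError(f"expected 8 octets, got {len(octets)}")
    return bytes(int(o, 16) for o in octets)


sample_wwn = bytes.fromhex("2100001b32a1b2c3")  # hypothetical WWN
print(format_wwn(sample_wwn))  # 21:00:00:1b:32:a1:b2:c3
```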

Nodes 16, 18, 20, and 22 may have identification
information in addition to their respective WWNs. For
example, according to the Fibre Channel protocol, when a
node such as node 16 is initialized and logs into fabric
28, the node is assigned a Fibre Channel ID. This ID may
be subject to change each time some initialization event
occurs, for example, when another node or device logs
into fabric 28. As depicted in FIGURE 1, fabric 28 has
assigned Fibre Channel IDs as follows: node 16 is S_ID_1,
node 18 is S_ID_2, node 20 is S_ID_3, and node 22 is
S_ID_4.

In the embodiment of FIGURE 1, the various WWNs and
Fibre Channel IDs may be stored in a computer readable
medium 34, which may be accessible to devices of SAN 10.
As shown in FIGURE 1, SAN 10 may include a computing
device 38 for establishing fabric 28. Such a computing
device may include a CPU communicatively coupled to
computer readable medium 34. Switch 36 may also have at
least one port 40 for interfacing with other devices to
form an overall Fibre Channel network.

In one embodiment of a system incorporating
teachings of the present disclosure, computing device 38
may be operable to execute a resource management engine,
which may be stored in computer readable medium 34. The
resource management engine may be operable to perform
several functions. For example, the resource management
engine may be operable to access a maintained list of the
WWNs and the Fibre Channel IDs of SAN 10 devices. In
addition, the resource management engine may be operable
to recognize a SCSI reset command issued by a node and to
convert the command into a storage resource releasing
command. The storage resource releasing command may be,
for example, a third party process log out or a SCSI
persistent reserve out command with a clear action.
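The conversion role of the resource management engine can be sketched as a small dispatcher. In this hypothetical Python model, the `ResourceManagementEngine` class, the `ReleaseMode` choices, and the dictionary returned by `on_scsi_reset` are illustrative stand-ins; a real engine would issue Fibre Channel or SCSI commands rather than return a description of one.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReleaseMode(Enum):
    THIRD_PARTY_LOGOUT = "third party process log out"
    PR_OUT_CLEAR = "persistent reserve out (clear)"


@dataclass
class ResourceManagementEngine:
    """Hypothetical sketch: intercepts a SCSI reset aimed at a node and
    turns it into a storage resource releasing command for that node only."""

    mode: ReleaseMode
    # Maintained list of SAN devices: WWN -> current Fibre Channel ID.
    fc_ids: dict[str, str] = field(default_factory=dict)

    def on_scsi_reset(self, target_wwn: str) -> dict:
        s_id = self.fc_ids.get(target_wwn)
        if s_id is None:
            raise KeyError(f"unknown node {target_wwn}")
        # The reset itself is never forwarded; only the releasing command is.
        return {"command": self.mode.value, "wwn": target_wwn, "s_id": s_id}


engine = ResourceManagementEngine(ReleaseMode.THIRD_PARTY_LOGOUT,
                                  {"AAA": "S_ID_1", "BBB": "S_ID_2"})
print(engine.on_scsi_reset("AAA")["s_id"])  # S_ID_1
```

The design point is that the reset is swallowed: only a command scoped to the target node (by WWN and Fibre Channel ID) leaves the engine, so nodes that were not involved are not subjected to a bus-wide reset.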

In a typical MSCS cluster, a SCSI reset command may
be issued when a node like node 18 or 20 fails to
acknowledge receipt of a timely heartbeat 42 or 44 from a
respective cluster mate. Heartbeats 42 and 44 may allow
nodes 18 and 22, respectively, to "see" whether their cluster
mates are still functioning.

If, for example, node 18 can no longer "see" node
16, node 18 may seek to have any LUN reservations held
for node 16 released. To accomplish this release, node
18 may send a SCSI reset command to initiate a low-level
bus reset of the SCSI buses associated with nodes 16 and
18. In some systems, for example an MSCS system, node 18
may wait some specified amount of time before trying to
reserve the LUNs that had been reserved by node 16. The
waiting allows node 16 to regain control of the LUNs
reserved to it before the SCSI reset. As such, if node
16 is "alive" despite node 18's failure to receive
heartbeat 42, node 16 may be able to re-establish its
resource reservations and in so doing let node 18 know
that it is "alive".

Unfortunately, as mentioned above, a SCSI reset in a
clustered computing environment can have a detrimental
effect on overall system performance and system
reliability. The disclosed system and resource
management engine may help limit a clustered computing
environment's reliance on SCSI resets in several
different ways. Example techniques for avoiding SCSI
resets may be better understood through consideration of
FIGURES 2 and 3.

FIGURE 2 depicts a flow diagram of a method 100 for managing 
storage resources in a clustered computing environment as described 
and claimed in our co-pending application GB-A-2367162. The method 
of FIGURE 2 may be implemented by a resource management engine 
executing on a storage controller attached to a SAN
fabric. In some embodiments, the resource management
engine may be executing on a CPU associated with a switch
like switch 36 of FIGURE 1. In other embodiments, the
CPU may be associated with a SAN device other than the
switch. For example, a resource management engine may be
executing on one or more nodes of a SAN.

During the operation of a SAN, a port login (PLOGI)
command may be received. As is known in the art, a PLOGI
command is a Fibre Channel command wherein a node logs
into a storage device attached to a SAN. A node may
execute a PLOGI command after the fabric has assigned a
Fibre Channel ID (S_ID) to the node. As is also
conventionally known, the S_ID of a node may be assigned
when a node executes a fabric login (FLOGI) command.

At step 102, the S_ID and the WWN of a cluster node
may be extracted. The extraction may occur at different
times. For example, the extraction may occur when a node
issues a PLOGI command. Once extracted, the S_ID and the
WWN may be updated and may be stored in a computer
readable medium. In some embodiments, this computer
readable medium may be part of a SAN and may be
accessible to several devices of the SAN.
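Step 102 amounts to keeping a WWN-to-S_ID table current. The Python sketch below assumes, as the text does, that the WWN is stable while the S_ID may be reassigned after a fabric event, so each PLOGI simply overwrites the stored S_ID; the `NodeRegistry` class and its methods are hypothetical.

```python
from typing import Optional


class NodeRegistry:
    """Hypothetical WWN <-> S_ID table kept by the resource management engine."""

    def __init__(self) -> None:
        self._by_wwn: dict[str, str] = {}

    def on_plogi(self, wwn: str, s_id: str) -> None:
        # Step 102: extract the S_ID and WWN when a node issues PLOGI.
        # The newest S_ID replaces any value recorded before a re-login.
        self._by_wwn[wwn] = s_id

    def s_id_for(self, wwn: str) -> Optional[str]:
        return self._by_wwn.get(wwn)

    def wwn_for(self, s_id: str) -> Optional[str]:
        # Reverse lookup; linear scan is fine for a cluster-sized table.
        for wwn, current in self._by_wwn.items():
            if current == s_id:
                return wwn
        return None


reg = NodeRegistry()
reg.on_plogi("AAA", "S_ID_1")
reg.on_plogi("AAA", "S_ID_5")  # node 16 re-logged in with a new ID
print(reg.s_id_for("AAA"))     # S_ID_5
```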

At step 104, a LUN reservation may be held for a
given node. In effect, the given node may have the
exclusive right to use the reserved LUN. As mentioned
above, cluster nodes often communicate with one another
using a heartbeat signal. At step 106, a SAN device may
detect a failure to receive a timely heartbeat signal.
Though the failure to receive a heartbeat signal may only
indicate a failed communication link between the
heartbeat sender and the heartbeat receiver, the failure
may result, as shown at step 108, in the determination
that a cluster node is inoperable.

In the embodiment of FIGURE 2, the determination
that a node is inoperable may cause another node to
issue a SCSI reset. As shown at step 110, a SCSI reset
command may be sent to release LUN reservations held for
the node believed to be inoperable (the "dead" node). At
step 112, the SCSI reset command may be converted into a
third party process log out. This conversion may, for
example, be performed by an executing resource management
engine.

At step 114, a log out command for the "dead" node
may be sent on the "dead" node's behalf by a third party.
For example, a resource management engine may access a
computer readable medium storing the "dead" node's S_ID
and WWN. The resource management engine may use the S_ID
and the WWN of the "dead" node to log out the "dead"
node. This third party process log out may result in the
releasing of LUN reservations held for the logged out
node.
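Step 114 can be modeled as follows. In this hypothetical Python sketch, a `LogoutRequest` carries the "dead" node's S_ID and WWN, so it identifies that node as the sender even though another party actually sends it. The reservation table layout and the function name are assumptions, and the actual Fibre Channel frame format is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogoutRequest:
    """Logout sent on behalf of the node named by s_id/wwn (hypothetical model)."""

    s_id: str     # S_ID of the node being logged out
    wwn: str      # WWN of the node being logged out
    sent_by: str  # actual sender, e.g. a switch or a surviving node


def third_party_logout(reservations: dict[str, tuple[str, str]],
                       req: LogoutRequest) -> list[str]:
    """Release every LUN held by the node identified in `req`.

    `reservations` maps LUN -> (holder S_ID, holder WWN). The identity in the
    request, not `sent_by`, decides which reservations are released.
    Returns the released LUNs.
    """
    released = [lun for lun, holder in reservations.items()
                if holder == (req.s_id, req.wwn)]
    for lun in released:
        del reservations[lun]
    return released


table = {"LUN_1": ("S_ID_1", "AAA"), "LUN_2": ("S_ID_2", "BBB")}
print(third_party_logout(table, LogoutRequest("S_ID_1", "AAA", "switch 36")))
# ['LUN_1']
```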

As shown at step 116 of FIGURE 2, other nodes of a
cluster may also log out or be logged out, and a loop
initialization protocol (LIP) link reset may be
initiated. The LIP link reset of step 118 may be
followed by step 120's generation of a state change
notification. In the embodiment of FIGURE 2, the state
change notification may cause active cluster nodes, that is,
nodes that are not dead, to perform a port login and to seek
LUN reservations. The port login of active cluster nodes
may be seen at step 122. If the "dead" node was not
dead, it may be able to regain its LUN reservations. If
the "dead" node was dead, other cluster nodes may now be
able to capture the LUN reservations held by the "dead"
node. In effect, the storage resources held by the dead
node will be made available to "live" nodes, resulting
in a better utilization of storage resources without a
SCSI reset.
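The recovery sequence of steps 110 through 122 is summarized in the Python sketch below. It is a simulation only: node and LUN names are taken from FIGURE 1, the function name is hypothetical, and the rule that a returning "dead" node re-reserves before other nodes is an assumption made here to model the statement that such a node may be able to regain its LUN reservations.

```python
def recover_after_missed_heartbeat(
    reservations: dict[str, str],   # LUN -> node holding the reservation
    live_nodes: set[str],           # nodes that actually respond after the LIP
    dead_node: str,                 # node presumed inoperable at step 108
    wanted: dict[str, list[str]],   # node -> LUNs it reserves after re-login
) -> list[str]:
    """Simulate steps 110-122 of method 100 and return a step log."""
    log = [f"110: SCSI reset requested against {dead_node}",
           "112: reset converted to a third party process log out"]
    # Step 114: logging out the presumed-dead node releases its reservations.
    for lun in [name for name, holder in reservations.items() if holder == dead_node]:
        del reservations[lun]
    log.append("114: third party log out sent; reservations released")
    log.append("116-120: nodes logged out, LIP link reset, state change notification")
    # Step 122: only nodes that are really alive log back in (PLOGI).
    # Assumption: a "dead" node that turns out to be alive re-reserves first.
    for node in sorted(live_nodes, key=lambda n: (n != dead_node, n)):
        for lun in wanted.get(node, []):
            reservations.setdefault(lun, node)  # first live claimant wins
    log.append("122: live nodes logged in and sought LUN reservations")
    return log


res = {"LUN_1": "node16", "LUN_2": "node18"}
recover_after_missed_heartbeat(res, {"node18"}, "node16", {"node18": ["LUN_1"]})
print(res)  # {'LUN_2': 'node18', 'LUN_1': 'node18'}
```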

An embodiment of a method 200 for managing storage 
resources in a clustered computing environment in accordance with the 
present invention may be seen in FIGURE 3. The method of FIGURE 
3, like the method of FIGURE 2, may be implemented by a resource 

management engine. This engine may be located at any number
of places. For example, the engine may be located at a
switch, a node, or a storage controller attached to a Fibre
Channel fabric.

As shown at step 202, method 200 may involve the
receiving of a SCSI LUN reservation command. A typical
SCSI reservation command may be cleared with a SCSI
reset. As mentioned above, SCSI resets may cause a
number of problems within a clustered computing
environment. As such, at step 204, the SCSI reserve
command may be converted to a SCSI persistent reserve out
command with a service action of RESERVE. The conversion
from SCSI reserve to SCSI persistent reserve may be
performed, for example, by an executing resource
management engine. The persistent reserve out command
may hold a persistent LUN reservation, as shown at step
206, for the holding node, that is, the node issuing the SCSI
reserve command.
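Step 204 can be pictured as a command filter sitting in the engine's I/O path. The Python sketch below assumes the incoming reservation is a SCSI-2 style RESERVE(6) command (operation code 0x16) and rewrites it into a PERSISTENT RESERVE OUT command with the RESERVE service action. The Exclusive Access type, the function name, and the pass-through behavior for other commands are assumptions, and the accompanying 24-byte parameter list carrying the node's reservation key is omitted.

```python
import struct

RESERVE_6 = 0x16               # SCSI-2 style RESERVE(6) operation code
PERSISTENT_RESERVE_OUT = 0x5F
SA_RESERVE = 0x01
TYPE_EXCLUSIVE_ACCESS = 0x3    # assumed closest match to RESERVE(6) semantics
LU_SCOPE = 0x0


def convert_reserve(cdb: bytes) -> bytes:
    """Rewrite a RESERVE(6) CDB as PERSISTENT RESERVE OUT / RESERVE.

    Any other command is passed through unchanged.
    """
    if len(cdb) == 0 or cdb[0] != RESERVE_6:
        return cdb
    scope_type = (LU_SCOPE << 4) | TYPE_EXCLUSIVE_ACCESS
    # opcode, service action, scope/type, 2 reserved bytes,
    # 4-byte parameter list length (24), control byte
    return struct.pack(">BBBxxIB", PERSISTENT_RESERVE_OUT, SA_RESERVE,
                       scope_type, 24, 0)


reserve6 = bytes([RESERVE_6, 0, 0, 0, 0, 0])
print(convert_reserve(reserve6).hex())  # 5f010300000000001800
```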
At step 208, it may be determined that the holding
node is inoperable. In response to this determination, a
SCSI reset command may be issued. The SCSI reset command
of step 210 may be converted at step 212 to a SCSI
persistent reserve out command with a service action of
CLEAR. In operation, the SCSI persistent reserve out command
with a service action of CLEAR may release the LUN
reservations held by the initial SCSI persistent reserve
out command. The LUN releasing of step 214 may
effectively release storage resources held by nodes
determined to be inoperable at step 208. This may result
in a better utilization of storage resources within a
clustered computing environment, and the better
utilization may be accomplished without employing SCSI
resets.
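The practical difference between a SCSI reset and the targeted CLEAR of steps 210 through 214 can be illustrated with the simplified Python model below. It is not a device-level simulation: the `LunState` class, the bookkeeping of pending I/O, and both functions are hypothetical, and the model only tracks which reservations and in-flight requests each approach disturbs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LunState:
    holder: Optional[str]                                 # node holding the reservation
    pending_io: list[str] = field(default_factory=list)   # nodes with I/O in flight


def scsi_reset(luns: dict[str, LunState]) -> int:
    """Conventional path: drop every reservation and abort all pending I/O.

    Returns the number of aborted requests.
    """
    aborted = 0
    for state in luns.values():
        aborted += len(state.pending_io)
        state.holder = None
        state.pending_io.clear()
    return aborted


def clear_for_dead_node(luns: dict[str, LunState], dead_node: str) -> list[str]:
    """Method 200 path: one PR OUT CLEAR per LUN the dead node held.

    LUNs held by other nodes, and their pending I/O, are left alone.
    """
    cleared = []
    for name, state in luns.items():
        if state.holder == dead_node:
            state.holder = None
            cleared.append(name)
    return cleared


def example_luns() -> dict[str, LunState]:
    return {"LUN_1": LunState("node16", ["node16"]),
            "LUN_2": LunState("node18", ["node18"])}


print(clear_for_dead_node(example_luns(), "node16"))  # ['LUN_1']
print(scsi_reset(example_luns()))                     # 2
```

Run on the same starting state, `scsi_reset` aborts both nodes' requests, while `clear_for_dead_node` releases only `LUN_1`, which is the behavior the disclosure aims for.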

Various changes to the above embodiment are
contemplated by the present disclosure. For example,
embodiments of the present disclosure may be implemented
in SANs having any number of topologies. There may be,
for example, numerous storage controllers, there may be a
resource management engine executing on each node of a
cluster, or there may be a single resource management
engine executing within each zone of a clustered
computing environment.
CLAIMS

1. A method for managing storage resources in a clustered computing 
environment, the method comprising: 

receiving a small computer system interface (SCSI) reservation 
command seeking to reserve a storage resource for a node of the clustered 
computing environment; and, 

in response to the reservation command, issuing a small computer 
system interface persistent reserve out command with a service action of 
reserve to reserve the storage resource for the node. 

2. The method of Claim 1, wherein a miniport driver receives the 
reservation command and issues the persistent reserve out command. 

3. The method of Claim 1 or Claim 2, further comprising releasing a 
reservation held for the node by issuing a small computer system interface 
persistent reserve out command with a service action of clear. 

4. A computer system, comprising: 

a first node of a clustered computing environment; 

a second node of the clustered computing environment; and, 

a resource management engine operable to convert a small
computer system interface (SCSI) reset command into a storage resource
releasing command.

5. The system of Claim 4, wherein the resource releasing command 
comprises a third party process log out. 
6. The system of Claim 4, wherein the resource releasing command
comprises a small computer system interface persistent reserve out
command with a clear action.

7. The system of any one of Claims 4 to 6, further comprising: 

a computer readable medium storing the resource management

engine; and, 

a central processing unit communicatively coupled to the computer 
readable medium and operable to execute the resource management engine. 

8. The system of Claim 7, further comprising: 

a plurality of computing platforms communicatively coupled to the first

node; 

a Fibre Channel switch communicatively coupled to the first node; and, 

a plurality of storage devices communicatively coupled to the Fibre 
Channel switch. 

9. The system of Claim 8, wherein the Fibre Channel switch comprises
the central processing unit.